Speaker-specific variability of phoneme durations
Authors
Abstract
The durations of phonemes vary across speakers. To characterise this variability, the correlations between phoneme durations across different speakers are studied, and a novel approach, based on the maximum likelihood criterion, is presented for predicting a particular speaker's unknown phoneme durations from that speaker's known phoneme durations. Several interesting patterns are observed. Phonemes from the same broad phonetic class tend to covary most strongly (and therefore intra-class predictions of unknown phoneme durations are most accurate), but significant cross-class correlations are also present. Consequently, only a few highly correlated phonemes' durations need to be known to make a good duration prediction.
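The abstract does not spell out the estimator, so the following is only a minimal sketch of one way such a prediction could be set up. It assumes that per-speaker mean phoneme durations are jointly Gaussian, in which case the maximum likelihood estimates of the mean and covariance give a conditional-mean predictor for the unknown durations. The function name, argument names, and the Gaussian assumption are illustrative and are not taken from the paper.

```python
import numpy as np


def predict_unknown_durations(train, known_idx, unknown_idx, known_values):
    """Predict a new speaker's unknown mean phoneme durations (sketch).

    train        : (n_speakers, n_phonemes) per-speaker mean durations
    known_idx    : column indices of phonemes observed for the new speaker
    unknown_idx  : column indices of phonemes to predict
    known_values : the new speaker's durations for known_idx

    Assumption (not from the paper): durations are jointly Gaussian.
    """
    train = np.asarray(train, dtype=float)

    # Maximum likelihood estimates of the mean and covariance
    # (bias=True divides by N rather than N - 1).
    mu = train.mean(axis=0)
    cov = np.cov(train, rowvar=False, bias=True)

    # Partition the covariance into known/known and unknown/known blocks.
    s_kk = cov[np.ix_(known_idx, known_idx)]
    s_uk = cov[np.ix_(unknown_idx, known_idx)]

    # Conditional mean of the unknown durations given the known ones:
    # mu_u + S_uk S_kk^{-1} (x_k - mu_k)
    delta = np.asarray(known_values, dtype=float) - mu[known_idx]
    return mu[unknown_idx] + s_uk @ np.linalg.solve(s_kk, delta)
```

Under this assumption, phonemes that covary strongly with the known ones receive large weights in `s_uk`, which is consistent with the observation that a few highly correlated phonemes carry most of the predictive information. The known/known block must be non-singular, so the set of known phonemes should not contain exactly collinear columns.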
Similar resources
Speech Rate Normalization Used to Improve Speaker Verification
A novel approach to speech rate normalization is presented. Models are constructed to model the way in which speech rate variation of a specific speaker influences the duration of phonemes. The models are evaluated in two ways. Firstly, the mean square error in phoneme duration based on our normalization is compared to the same error when such normalization is not applied. The second evaluation...
Spectral Characteristics of Vocal Tract for Speaker Recognition
The basic idea of the presented approach is to evaluate spectral characteristics corresponding to the anatomy of the speaker's vocal tract independently of the actually pronounced phoneme. The procedure for determining the speaker-specific average spectrum is based on the LPC approach. Experimental results have shown an evolution in a long-time spectrum with respect to the duration of text in...
Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices
Differences in human vocal tract lengths can cause inter-speaker acoustic variability in speech signals spoken by different speakers for the same text, and these variations affect the robustness of a speaker-independent (SI) speech recognition system. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the effect of these...
Phonetic and speaker variations in automatic emotion classification
The speech signal contains information that characterises the speaker and the phonetic content, together with the emotion being expressed. This paper looks at the effect of this speaker- and phoneme-specific information on speech-based automatic emotion classification. The performances of a classification system using established acoustic and prosodic features for different phonemes are compared,...
Phoneme background model for information bottleneck based speaker diarization
Acoustic variability of speakers arises due to differences in their vocal tract characteristics. These individual speaker characteristics are reflected in a speech signal when speakers pronounce a given phoneme. The current work hypothesizes that clusters within a phoneme spoken by multiple speakers roughly correspond to different speakers. Based on this hypothesis, a Gaussian mixture model (GM...
Journal title:
- South African Computer Journal
Volume 40, Issue
Pages -
Publication date 2008